[hub] fix kernels fp8#9271
Conversation
There was a problem hiding this comment.
Code Review
This pull request updates the transformers dependency version and introduces a _patch_kernels context manager in transformers_engine.py to handle modelscope hub patching during inference. The review feedback points out critical flaws in the context manager's implementation, including a lack of exception safety that could leave the global state inconsistent and potential thread-safety issues when processing concurrent requests.
| def _patch_kernels(self): | ||
| use_hf = self.use_hf | ||
| if use_hf is None: | ||
| use_hf = True if use_hf_hub() else False | ||
| if not use_hf: | ||
| try: | ||
| from modelscope import patch_hub, unpatch_hub | ||
| except ImportError: | ||
| use_hf = True | ||
| if not use_hf: | ||
| patch_hub() | ||
| yield | ||
| if not use_hf: | ||
| unpatch_hub() |
There was a problem hiding this comment.
The _patch_kernels context manager has two significant issues:
- Exception Safety: It lacks a
try...finallyblock around theyield. If an exception occurs during generation (e.g., inself.template.generate),unpatch_hub()will never be called, leaving the global environment in a patched state. - Thread Safety:
patch_hubandunpatch_hubmodify global state inmodelscope/transformers. Since_infer_streamexecutes generation in a separate thread, concurrent requests can lead to race conditions where one thread's cleanup unpatches the hub while another thread is still using it.
Consider using a try...finally block for safety. For thread safety, if this engine is used in a concurrent environment, a global lock and reference counter might be necessary, or simply patching once during initialization if the mode is consistent for the process.
def _patch_kernels(self):
use_hf = self.use_hf if self.use_hf is not None else use_hf_hub()
if use_hf:
yield
return
try:
from modelscope import patch_hub, unpatch_hub
except ImportError:
yield
return
patch_hub()
try:
yield
finally:
unpatch_hub()
No description provided.